Daily Weather Data

In this article, we analyze a weather dataset from Kaggle.com.

Data description from Kaggle:

Loading the Dataset

Preprocessing

Imputing Missing Values:

Problem Description

Let's set Relative Humidity (Afternoon) as the target variable. That is, given the rest of the features in the dataset, we would like to predict whether or not it is humid at 3 PM. To turn this into a binary label, we use the median of Relative Humidity (Afternoon) as the threshold: values greater than or equal to the median are assigned 1, and values below the median are assigned 0.
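
The median thresholding described above could be sketched as follows. This is a minimal illustration rather than the article's original code: the helper name binarize_by_median and the sample humidity values are assumptions made up for the example.

```python
import pandas as pd


def binarize_by_median(values: pd.Series) -> pd.Series:
    """Return 1 where a value is >= the series median, 0 otherwise."""
    median = values.median()
    # Missing values compare as False and would become 0 here, so they
    # should be imputed before this step.
    return (values >= median).astype(int)


# Hypothetical afternoon humidity readings (percent), for illustration only.
humidity_3pm = pd.Series([20.0, 35.5, 50.0, 62.0, 80.0])
target = binarize_by_median(humidity_3pm)
print(target.tolist())
```

Because values equal to the median are labeled 1, the two classes come out roughly balanced, which is one reason the median is a convenient threshold.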

Features with high variance

Moreover, features with very high variance can hurt the modeling process, since they can dominate features measured on smaller scales. For this reason, we standardize the features by removing the mean and scaling to unit variance. In this article, we demonstrate the benefits of scaling data using StandardScaler().
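
As a rough sketch of this standardization step, the example below applies scikit-learn's StandardScaler to a small feature matrix. The two columns and their values are hypothetical stand-ins for weather features on very different scales, not rows from the Kaggle dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: column 0 on a large scale, column 1 on a small one.
X = np.array([
    [1000.0, 0.1],
    [1010.0, 0.3],
    [1020.0, 0.5],
])

scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation,
# then returns (X - mean) / std column by column.
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

In a train/test setup, the scaler would normally be fitted on the training split only and then applied to the test split with scaler.transform, so that no information from the test data leaks into the scaling.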


References

  1. Kaggle Dataset: Daily Weather Dataset